Spectral Methods for Correlated Topic Models
In this paper, we propose guaranteed spectral methods for learning a broad range of topic models that generalize the popular Latent Dirichlet Allocation (LDA). We overcome LDA's inability to incorporate arbitrary topic correlations by assuming that the hidden topic proportions are drawn from a flexible class of Normalized Infinitely Divisible (NID) distributions. NID distributions are generated by normalizing a family of independent Infinitely Divisible (ID) random variables; the Dirichlet distribution is the special case obtained by normalizing a set of Gamma random variables. We prove that this flexible topic-model class can be learned via spectral methods using only moments up to the third order, with (low-order) polynomial sample and computational complexity. The proof is based on a key new technique, derived here, that allows us to diagonalize the moments of the NID distribution through an efficient procedure requiring only univariate integrals, despite the fact that we are handling high-dimensional multivariate moments. To assess the performance of our proposed Latent NID topic model, we use two real datasets of articles collected from the New York Times and PubMed. Our experiments yield improved perplexity on both datasets compared with the baseline.
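The Gamma-normalization construction mentioned in the abstract (the Dirichlet as a normalized set of independent Gamma variables) can be sketched in a few lines. This is an illustrative sample-level sketch using NumPy, not the paper's spectral learning procedure, and the function name is ours:

```python
import numpy as np

rng = np.random.default_rng(0)

def dirichlet_via_gammas(alpha, rng):
    """Sample topic proportions by normalizing independent Gamma draws.

    The Dirichlet is the special case of a Normalized Infinitely
    Divisible (NID) distribution obtained from Gamma variables; other
    ID families yield NID laws that can express topic correlations.
    """
    g = rng.gamma(shape=np.asarray(alpha, dtype=float), scale=1.0)
    return g / g.sum()  # normalize onto the probability simplex

theta = dirichlet_via_gammas([0.5, 0.5, 2.0], rng)
print(theta)  # non-negative proportions summing to 1
```

Replacing the Gamma draws with another infinitely divisible family is what gives the NID class its extra flexibility over the Dirichlet.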
Combining Symbolic Expressions and Black-box Function Evaluations in Neural Programs
Neural programming involves training neural networks to learn programs,
mathematics, or logic from data. Previous works have failed to achieve good
generalization performance, especially on problems and programs with high
complexity or on large domains. This is because they mostly rely either on
black-box function evaluations that do not capture the structure of the
program, or on detailed execution traces that are expensive to obtain, and
hence the training data has poor coverage of the domain under consideration. We
present a novel framework that utilizes black-box function evaluations, in
conjunction with symbolic expressions that define relationships between the
given functions. We employ tree LSTMs to incorporate the structure of the
symbolic expression trees. We use tree encoding for numbers present in function
evaluation data, based on their decimal representation. We present an
evaluation benchmark for this task to demonstrate that our proposed model combines
symbolic reasoning and function evaluation in a fruitful manner, obtaining high
accuracies in our experiments. Our framework generalizes significantly better
to expressions of higher depth and is able to fill partial equations with valid
completions. Comment: Published as a conference paper at the sixth International Conference on Learning Representations (ICLR), 2018.
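The tree encoding of numbers by their decimal representation is described only briefly above; one plausible minimal reading is a right-branching tree over a number's decimal digits, sketched below. The exact encoding used in the paper may differ, and the function name is ours:

```python
def decimal_digit_tree(n):
    """Represent a non-negative integer as a right-branching tree
    (nested tuples) over its decimal digits, so that a tree-structured
    encoder such as a tree LSTM can consume it digit by digit.

    Illustrative only: the paper feeds such structures, together with
    symbolic expression trees, into tree LSTM encoders.
    """
    digits = list(str(n))
    tree = (digits[-1],)          # leaf: least significant digit
    for d in reversed(digits[:-1]):
        tree = (d, tree)          # wrap higher digits around it
    return tree

print(decimal_digit_tree(345))  # ('3', ('4', ('5',)))
```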
Are you going to the party: depends, who else is coming? [Learning hidden group dynamics via conditional latent tree models]
Scalable probabilistic modeling and prediction in high dimensional
multivariate time-series is a challenging problem, particularly for systems
with hidden sources of dependence and/or homogeneity. Examples of such problems
include dynamic social networks with co-evolving nodes and edges and dynamic
student learning in online courses. Here, we address these problems through the
discovery of hierarchical latent groups. We introduce a family of Conditional
Latent Tree Models (CLTM), in which tree-structured latent variables
incorporate the unknown groups. The latent tree itself is conditioned on
observed covariates such as seasonality, historical activity, and node
attributes. We propose a statistically efficient framework for learning both
the hierarchical tree structure and the parameters of the CLTM. We demonstrate
competitive performance in multiple real world datasets from different domains.
These include a dataset on students' attempts at answering questions in a
psychology MOOC, Twitter users participating in an emergency management
discussion and interacting with one another, and windsurfers interacting on a
beach in Southern California. In addition, our modeling framework provides
valuable and interpretable information about the hidden group structures and
their effect on the evolution of the time series.
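The CLTM tree structure is learned with the paper's own statistically efficient framework. As a point of reference only, the classic Chow-Liu algorithm illustrates the general idea of learning a tree over variables: a maximum-weight spanning tree on pairwise mutual information, here over observed (not latent) variables. All names below are ours:

```python
import numpy as np
from itertools import combinations

def mutual_info(x, y):
    """Empirical mutual information between two discrete samples."""
    mi = 0.0
    for a in np.unique(x):
        for b in np.unique(y):
            p_ab = np.mean((x == a) & (y == b))
            if p_ab > 0:
                mi += p_ab * np.log(p_ab / (np.mean(x == a) * np.mean(y == b)))
    return mi

def chow_liu_tree(data):
    """Maximum-weight spanning tree on pairwise mutual information,
    built with Kruskal's algorithm and union-find.
    data: (n_samples, n_vars) array of discrete observations."""
    d = data.shape[1]
    edges = sorted(
        ((mutual_info(data[:, i], data[:, j]), i, j)
         for i, j in combinations(range(d), 2)),
        reverse=True)
    parent = list(range(d))
    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u
    tree = []
    for _, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((i, j))
    return tree

# Demo: x1 is a noisy copy of x0; x2 is independent noise.
rng = np.random.default_rng(1)
x0 = rng.integers(0, 2, 500)
x1 = np.where(rng.random(500) < 0.05, 1 - x0, x0)
x2 = rng.integers(0, 2, 500)
tree = chow_liu_tree(np.column_stack([x0, x1, x2]))
print(tree)  # the strong (0, 1) dependence is selected
```

The CLTM goes well beyond this baseline by introducing latent tree nodes and conditioning the tree on observed covariates.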
Funding: 7R01HL141813-06 (NIH/National Heart, Lung, and Blood Institute); Optum Labs, Inc.; NIH/National Institutes of Health. Accepted manuscript.
Dividing and Conquering a BlackBox to a Mixture of Interpretable Models: Route, Interpret, Repeat
ML model design either starts with an interpretable model or a Blackbox and
explains it post hoc. Blackbox models are flexible but difficult to explain,
while interpretable models are inherently explainable. Yet, interpretable
models require extensive ML knowledge and tend to be less flexible and to
underperform their Blackbox variants. This paper aims to blur the
distinction between a post hoc explanation of a Blackbox and constructing
interpretable models. Beginning with a Blackbox, we iteratively carve out a
mixture of interpretable experts (MoIE) and a residual network. Each
interpretable model specializes in a subset of samples and explains them using
First Order Logic (FOL), providing basic reasoning on concepts from the
Blackbox. We route the remaining samples through a flexible residual. We repeat
the method on the residual network until all the interpretable models explain
the desired proportion of data. Our extensive experiments show that our route,
interpret, and repeat approach (1) identifies a diverse set of
instance-specific concepts with high concept completeness via MoIE without
compromising performance, (2) identifies the relatively "harder" samples
to explain via residuals, (3) outperforms the interpretable by-design models by
significant margins during test-time interventions, and (4) fixes the shortcut
learned by the original Blackbox. The code for MoIE is publicly available at:
https://github.com/batmanlab/ICML-2023-Route-interpret-repeat. Comment: Published at ICML, 2023.
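The route-interpret-repeat control flow can be sketched abstractly. The toy below substitutes a single-feature threshold rule for the paper's FOL-based interpretable experts and routes samples by correctness rather than a learned selector, so it mirrors only the loop structure, not MoIE itself; all names are ours:

```python
import numpy as np

def route_interpret_repeat(X, y, coverage=0.9, max_experts=5):
    """Toy sketch of the route-interpret-repeat loop: at each step, fit
    the best single-feature threshold rule (a stand-in for an
    interpretable expert), let it claim the samples it classifies
    correctly, and route the rest to the residual. Repeat until the
    experts cover the desired proportion of the data."""
    remaining = np.arange(len(y))
    experts = []
    while remaining.size and len(experts) < max_experts:
        best = None
        for f in range(X.shape[1]):
            for t in np.unique(X[remaining, f]):
                pred = (X[remaining, f] >= t).astype(int)
                for label in (pred, 1 - pred):
                    correct = remaining[label == y[remaining]]
                    if best is None or correct.size > best[0].size:
                        best = (correct, f, t)
        correct, f, t = best
        experts.append((f, t, correct.size))          # one expert per round
        remaining = np.setdiff1d(remaining, correct)  # residual samples
        if 1 - remaining.size / len(y) >= coverage:
            break
    return experts, remaining

# Demo: a single threshold on feature 0 explains all four samples.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
experts, residual = route_interpret_repeat(X, y)
print(experts, residual)
```

In the paper, each round instead distills concept-based FOL explanations from the Blackbox and trains a flexible residual network on the routed remainder.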
Learning Latent Hierarchical Structures via Probabilistic Models and Deep Learning
Hierarchical structures arise in many real-world applications and domains. For example, in social networks, people's relationships and the groups to which they belong form a hierarchy. In natural language and computer programs, parse trees (which have a hierarchical structure) are used to represent the compositionality of expressions. These hierarchies strongly affect the statistics and the behavior of the data, so it is important to develop algorithms that take these structures into account when modeling such data. Apart from these hierarchical structures, some datasets are best explained with hierarchical models even though there is no apparent hierarchy in the data itself. For instance, when modeling the occurrence of words in a document, it is more realistic to assume that the words are drawn in a hierarchical manner from a topic distribution rather than independently from a single topic. In this dissertation, we focus on capturing these hierarchies and leveraging them for modeling high-dimensional datasets.

Hierarchical structures underlying the data are either observed or latent. For example, in the context of computer programs, the syntax tree is inherent to the program and is therefore observed. On the other hand, the statistical dependence structure of a social network's users is latent. In this dissertation, we study both types of hierarchies and develop models under both structures, because both arise in many applications and are equally important. Nevertheless, capturing latent hierarchical structures is more challenging. We develop novel probabilistic models to capture latent hierarchies and present statistically efficient and provably consistent parameter-learning algorithms for them. When capturing observed hierarchical structures, we develop deep learning models that learn low-dimensional continuous representations for the discrete symbols and variables.
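The contrast drawn above, words drawn hierarchically through per-document topic proportions versus independently from a single topic, can be made concrete with a small generative sketch. The topic matrix and all names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two topics over a 4-word vocabulary (each row sums to 1).
topics = np.array([[0.4, 0.4, 0.1, 0.1],
                   [0.1, 0.1, 0.4, 0.4]])

def single_topic_doc(n_words, topic, rng):
    """Flat model: every word drawn independently from one fixed topic."""
    return rng.choice(4, size=n_words, p=topics[topic])

def hierarchical_doc(n_words, alpha, rng):
    """Hierarchical model: first draw per-document topic proportions,
    then a topic for each word, then the word itself -- the LDA-style
    process the abstract contrasts with the single-topic assumption."""
    theta = rng.dirichlet(alpha)                        # document level
    z = rng.choice(len(theta), size=n_words, p=theta)   # per-word topics
    return np.array([rng.choice(4, p=topics[zi]) for zi in z])

doc = hierarchical_doc(50, [1.0, 1.0], rng)
print(doc[:10])  # word indices mixing both topics within one document
```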